geometry-inspired unified mdp framework
Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Mustafin, Arsenii, Sheng, Xinyi, Baumann, Dominik
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.